
Support validation of collection params with record types#7025

Merged
bentsherman merged 2 commits into master from record-collection-params on Apr 28, 2026

Conversation

@bentsherman
Member

This PR extends the support for loading a collection-type param from a CSV/JSON/YAML file to also validate and convert the elements against a record type when one is specified.

Notable changes:

  • Add ADR for workflow params (with appendix on runtime type analysis)
  • Compile params block as a hidden class to preserve runtime type information
  • Validate and convert collection elements against record type when specified
  • Add custom @Nullable annotation to model nullable record fields at runtime
  • Extend Types helper class to render parameterized types
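For illustration, the combination might look like the following sketch (the `Sample` record and field names here are hypothetical, echoing the examples discussed later in this thread):

```groovy
record Sample {
    id: String
    fastq_1: Path
    fastq_2: Path?   // nullable field, modeled at runtime via the custom @Nullable annotation
}

params {
    // when a CSV/JSON/YAML file path is supplied for this param, each
    // element is validated and converted against the Sample record type
    samples: List<Sample>
}
```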

@bentsherman bentsherman requested a review from jorgee April 10, 2026 21:39
@bentsherman bentsherman requested a review from a team as a code owner April 10, 2026 21:39
@netlify

netlify Bot commented Apr 10, 2026

Deploy Preview for nextflow-docs-staging ready!

  • Latest commit: c3b1ed9
  • Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/69f108094361700008066088
  • Deploy Preview: https://deploy-preview-7025--nextflow-docs-staging.netlify.app

@pditommaso pditommaso self-requested a review April 14, 2026 09:44
@jorgee
Contributor

jorgee commented Apr 14, 2026

Suggestions from AI review:

1. Null-safety bug in TypeHelper.asType for nullable Path fields

In modules/nf-commons/src/main/nextflow/util/TypeHelper.groovy, if a record has a nullable Path field (e.g., fastq_2: Path?) and the input value is null, then value.toString() at line ~99 will throw a
NullPointerException. The validation loop in asRecordType skips nullable fields, but the conversion loop still calls asType(entry.value, ...) when entry.value is null. Needs a null guard at the top of asType.
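A minimal sketch of the suggested guard (assuming `asType` takes the raw value and the target type, per the discussion above; this is not the actual implementation):

```groovy
static Object asType(Object value, Type type) {
    // null guard: a nullable field (e.g. `fastq_2: Path?`) may legitimately
    // hold null, so return it as-is instead of calling value.toString()
    if( value == null )
        return null
    // ... existing conversion logic (Path, collection, record, etc.) ...
}
```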

2. No unit tests for TypeHelper conversion methods

getRawType, isCollectionType, asType, asCollectionType, and asRecordType have no dedicated unit tests — only tested indirectly through ParamsDslTest. Given these methods handle type reflection and conversion (prone
to subtle bugs), they should have their own tests covering edge cases: null values, nested parameterized types, unknown types, etc.

3. No test for @Nullable annotation propagation

There's no test verifying that a ?-suffixed record field actually gets the @Nullable annotation at runtime, or that asRecordType correctly allows missing values for nullable fields. Only the rejection path (non-nullable missing field) is partially covered.


- Replacing `nextflow_schema.json` -- while the `params` block addresses many of the same needs, existing pipelines that use a JSON Schema should not be required to migrate. A native integration with `nextflow_schema.json` can be explored in the future.

- Supporting nested params -- the `params` block only supports a flat list of params. Nested params can still be used in the config, but they do not have first-class support at this time.
Contributor


Record types are considered nested params, right? I mean, can I define a param as a record type, like the following?

params {
    meta: Metadata
}

record Metadata {
    id: String
    ....
}

Member Author


Yes, that would be the correct way to support a "nested" param, although I prefer not to call a record param a nested param because they have slightly different semantics.

Contributor


I have tried to run a workflow with a record type defined as a param as above, but I can't find a way to pass the value. I have tried the following:

nextflow run main.nf --meta.id='hola'

But I get the following error. It seems that a Map is not assignable to a record type.

Parameter meta with type Metadata cannot be assigned to [id:hola] [LinkedHashMap]

Am I doing it correctly? Is it out of the scope of this PR, or should it work?

If it is out of scope, it is strange that a list of records is supported but not a single one.

Member Author


Thinking more about it, I think it's out of scope. The main goal here is to support collections of records. We can look into supporting a basic "record" param, but I think there are still some unresolved details.

@bentsherman bentsherman requested a review from jorgee April 17, 2026 16:32
@bentsherman
Member Author

  • Fixed null-safety bug
  • Added unit tests for TypeHelper

        return c;
    if( type instanceof ParameterizedType pt )
        return (Class) pt.getRawType();
    throw new IllegalStateException();
Contributor


Shouldn't it include a message with the reason for the exception?

Member Author


As far as I can tell this should never happen, so I just throw an IllegalStateException to capture the stack trace, following the convention used by the rest of the codebase.

Comment thread on adr/20250825-workflow-params.md (outdated)
Contributor

@jorgee jorgee left a comment


Added a couple of comments:

  • All tests use a plain List. It works in this case but not with custom record types (e.g. a list of records), as in the examples in the ADR and documentation. I also miss a test with a custom type that includes another custom type or a List<Path> as an inner property.
  • Minor issue about an exception without a reason message

@pditommaso
Member

Some initial thoughts and a few concerns about the direction of this ADR/implementation:

  1. Embedding file parsing and type binding into the core runtime

CSV/JSON/YAML parsing and the type-binding rules (file-path columns, nullable handling, record construction) are being baked directly into the core. This is a substantial surface to commit to — CSV alone carries quoting, separator, encoding, and header-inference choices — and once it's in core, every edge case and format request becomes a Nextflow release decision.

It also sets a narrow precedent: users will reasonably expect the same typed-binding experience for other formats that are increasingly common in scientific and data workflows — Parquet, Iceberg, Arrow, Avro, and so on. Committing to CSV/JSON/YAML in core makes each of those future additions a core concern too. This feels like a responsibility that should sit behind a built-in function, factory, or a core plugin rather than in the runtime itself.

  2. Schemas evolve — hand-maintaining them in Groovy is cumbersome

Input schemas (samplesheets, JSON Schema definitions, vendor/LIMS formats) are maintained upstream and change over time. Requiring authors to hand-transcribe them into Groovy record declarations creates a dual-maintenance problem with no tooling to catch drift, and it pushes schema ownership onto people who don't necessarily write Groovy. As a developer I'd rather be able to import a JSON Schema — or even a CSV file — and have it bind automatically to a Nextflow record.

  3. Alignment with existing workflow input definitions

More broadly, I think it's worth exploring how the params model could fit into the existing sub-workflow input definition rather than standing alongside it as a parallel construct. There may be an opportunity here to enable workflow composition and chaining along the same lines we already do for sub-workflows — this deserves a closer look before the two mechanisms diverge further.

I think these points are worth a deeper discussion before the params block becomes a hard commitment. Happy to dig into any of them further.

@bentsherman
Member Author

One of the unit tests had List<Record> instead of List<Sample>, and so it wasn't capturing the error you encountered. That is now fixed.

I also added a more complex unit test that converts a record containing a list of records

@bentsherman bentsherman requested a review from jorgee April 21, 2026 19:28
@bentsherman
Member Author

> CSV/JSON/YAML parsing and the type-binding rules (file-path columns, nullable handling, record construction) are being baked directly into the core. This is a substantial surface to commit to — CSV alone carries quoting, separator, encoding, and header-inference choices — and once it's in core, every edge case and format request becomes a Nextflow release decision.

CSV is indeed a tricky one. For this reason, the documentation states that the CSV must have a header row and use a comma separator, so that there is no ambiguity. I would be open to only supporting JSON in core, but CSV samplesheets are so ubiquitous that I figure it's better to support them.

JSON/YAML don't have these problems though, I think the behavior for them is quite clear

> It also sets a narrow precedent: users will reasonably expect the same typed-binding experience for other formats that are increasingly common in scientific and data workflows — Parquet, Iceberg, Arrow, Avro, and so on. Committing to CSV/JSON/YAML in core makes each of those future additions a core concern too. This feels like a responsibility that should sit behind a built-in function, factory, or a core plugin rather than in the runtime itself.

I agree. I would like to make this functionality extensible, with only CSV/JSON/YAML supported in the core runtime (or even just JSON). We should be able to do that in 26.10. Right now I'm focused on the basic functionality

> Input schemas (samplesheets, JSON Schema definitions, vendor/LIMS formats) are maintained upstream and change over time. Requiring authors to hand-transcribe them into Groovy record declarations creates a dual-maintenance problem with no tooling to catch drift, and it pushes schema ownership onto people who don't necessarily write Groovy. As a developer I'd rather be able to import a JSON Schema — or even a CSV file — and have it bind automatically to a Nextflow record.

These record types will need to be defined in pipeline code either way, in order to model the data as it flows through the pipeline. So it makes more sense for these record types to be the source of truth and to generate JSON schemas from the record types.

External tooling like Seqera Platform can provide pre-launch validation, e.g. comparing a pipeline input schema with upstream dataset schema and reporting any mismatch in the launch form.

> More broadly, I think it's worth exploring how the params model could fit into the existing sub-workflow input definition rather than standing alongside it as a parallel construct. There may be an opportunity here to enable workflow composition and chaining along the same lines we already do for sub-workflows — this deserves a closer look before the two mechanisms diverge further.

It's a bit late since we already made the params block stable in 25.10. Even so, what you are saying mostly comes down to syntax -- whether to define an entry workflow as a trio of params / workflow / output or as a subworkflow with some implicit behavior. Either way, you still need to be able to translate between Nextflow data types and the external world.

@bentsherman
Member Author

To put it concisely -- loading a list of records from a CSV/JSON/YAML file in the params block is the mirror image of saving an output channel to a CSV/JSON/YAML index file in the output block.
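As a sketch of that symmetry (param and file names hypothetical; the output-block shape follows the existing workflow output definition):

```groovy
// input direction: the params block loads samples from a file
// (e.g. `--samples samplesheet.csv`) and validates each row as a Sample
params {
    samples: List<Sample>
}

// output direction: the output block saves a channel back to an index file
output {
    samples {
        index {
            path 'samples.json'
        }
    }
}
```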

I'm open to which file formats are supported in core, but I think both directions should be supported and, ultimately, both directions should be extensible so that people can integrate with arbitrary file formats via plugin.

@bentsherman bentsherman added this to the 26.04 milestone Apr 22, 2026
Contributor

@jorgee jorgee left a comment


Last changes look fine to me

Member

@pditommaso pditommaso left a comment


> I think both directions should be supported and, ultimately, both directions should be extensible so that people can integrate with arbitrary file formats via plugin.

I'm not sure this should go in the core; I'm not OK with merging this.

@bentsherman
Member Author

@pditommaso did you read my previous comment? I don't understand why you are blocking this

@pditommaso
Member

Let me clarify this further: I fully support static types for params and compile-time validation.

My concern is essentially about overloading and embedding data binding and type conversion into core functions/factories like fromList, as shown in:

// after (CSV, JSON, or YAML)
ch_samples = channel.fromList(params.samples)

This functionality is to some extent not directly related to this PR/ADR, and in my view it should be provided via one or more new factory methods or a core plugin dedicated to data loading/import/mapping.

@bentsherman bentsherman linked an issue Apr 28, 2026 that may be closed by this pull request
@bentsherman
Member Author

In that example, when you call channel.fromList(params.samples), params.samples has already been converted to List<Sample>. So fromList is only converting List<Sample> to Channel<Sample>, same as before.

The "magic" happens in the params block when the user supplies a file path instead of a literal list. The file parsing bit was originally implemented in #6675, and this PR adds the record type validation.
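To illustrate the distinction (a sketch, with `Sample` as in the earlier discussion):

```groovy
// by the time the workflow body runs, params.samples is already a
// validated List<Sample>, whether it came from a literal list or from
// a CSV/JSON/YAML file path supplied on the command line
ch_samples = channel.fromList(params.samples)   // Channel<Sample>, same as before
```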

I can remove the file parsing piece for now if you want more time to investigate it before we commit to it

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman bentsherman requested a review from a team as a code owner April 28, 2026 19:18
@bentsherman
Member Author

Updated based on our discussion:

@bentsherman bentsherman merged commit 297172c into master Apr 28, 2026
25 checks passed
@bentsherman bentsherman deleted the record-collection-params branch April 28, 2026 20:25